improve bin-packing performance #652

Merged: 4 commits into aws:main on Sep 1, 2021
Conversation

@bwagner5 (Contributor)

Issue, if available:
#625

Description of changes:

  • Increased bin-packing performance. This optimization adds a check for a terminable case in a loop, which allows a significant number of loop iterations to be skipped.

    • The loop is responsible for packing pods onto a specific packable node. Once the node was full, the loop would previously continue iterating unnecessarily. This change also adds a smaller optimization to the check that determines whether the node is full: the smallest pod in the list (the list is already sorted when passed in) is added to the currently reserved resources to see whether any more progress could possibly be made by continuing through the loop (see the sketch after this list).
  • The terminable-case optimization in the packing loop yielded roughly a 30x improvement.

  • The smaller "min-pod" optimization resulted in a further 50% improvement.
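
A rough sketch of that check, assuming the reserve helper quoted later in this conversation (the actual merged isFull may differ):

// Sketch only: would reserving even the smallest remaining pod overflow the
// node's totals? If so, no further progress is possible and the packing loop
// can terminate early.
func (p *Packable) isFull(minPod *v1.Pod) bool {
	requests := resources.RequestsForPods(minPod)
	requests[v1.ResourcePods] = *resource.NewQuantity(1, resource.BinarySI)
	candidate := resources.Merge(p.reserved, requests)
	for resourceName, quantity := range candidate {
		if quantity.Cmp(p.total[resourceName]) > 0 {
			return true // even the smallest pod no longer fits
		}
	}
	return false
}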

Benchmarks:

  • Packing 20,000 pods previously took 91 seconds; it now takes about 3 seconds and uses drastically less memory. NOTE: the "post-optimization" benchmark graph was produced before the minPod optimization, which reduced the 20,000-pod case from around 6 seconds to 3 seconds. The benchmarks were executed on 2 m5.large instance types.

[benchmark graphs]

Testing:

  • Performed A/B testing with 1:1 inputs; both implementations produced the same output. The specific test walked cpu, memory, and pods-to-pack up in increments of 100 millicores, 64 MB, and 1 pod respectively, with upper limits of 10,000 millicores, 8 GB, and 90 pods (a rough sketch follows this list).
  • A/B tests on multiple-node packings were also performed ad hoc.
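
For illustration only, the A/B sweep described above could be structured roughly like this (makeTestPods, oldPacker, and newPacker are hypothetical names, not the actual test code; assumes the standard testing and reflect packages):

func TestPackingABSweep(t *testing.T) {
	// Walk cpu, memory, and pod count up in the increments described above
	// and require that the old and new packers produce identical results.
	for cpu := int64(100); cpu <= 10000; cpu += 100 { // millicores
		for mem := int64(64); mem <= 8192; mem += 64 { // MB
			for n := 1; n <= 90; n++ {
				pods := makeTestPods(n, cpu, mem) // hypothetical pod builder
				a := oldPacker.Pack(pods)         // packing behavior before this PR
				b := newPacker.Pack(pods)         // packing behavior after this PR
				if !reflect.DeepEqual(a, b) {
					t.Fatalf("packings diverged at cpu=%dm mem=%dMB pods=%d", cpu, mem, n)
				}
			}
		}
	}
}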

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Review thread on pkg/packing/packer.go (outdated, resolved):
}
return result
}

// isFull checks if the reserved resources + the minimum resourced pod overflows the total resources available.
func (p *Packable) isFull(minPod *v1.Pod) bool {
@ellistarn (Contributor) commented on Aug 31, 2021:

Any reason you can't share logic with:

func (p *Packable) reservePod(pod *v1.Pod) bool {
	requests := resources.RequestsForPods(pod)
	requests[v1.ResourcePods] = *resource.NewQuantity(1, resource.BinarySI)
	return p.reserve(requests)
}
func (p *Packable) reserve(requests v1.ResourceList) bool {
	candidate := resources.Merge(p.reserved, requests)
	// If any candidate resource exceeds total, fail to reserve
	for resourceName, quantity := range candidate {
		if quantity.Cmp(p.total[resourceName]) > 0 {
			return false
		}
	}
	p.reserved = candidate
	return true
}

As far as I understand it, this function does exactly what you need, you'd just need a dry-run mode or something.
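
For context, the suggested dry-run split could look roughly like this (illustrative only; as discussed below, the PR ultimately kept reserve and isFull separate):

// fits is the dry-run check: would the candidate reservation stay within totals?
func (p *Packable) fits(requests v1.ResourceList) bool {
	candidate := resources.Merge(p.reserved, requests)
	for resourceName, quantity := range candidate {
		if quantity.Cmp(p.total[resourceName]) > 0 {
			return false
		}
	}
	return true
}

// reserve would then only commit the reservation when the dry-run check passes.
func (p *Packable) reserve(requests v1.ResourceList) bool {
	if !p.fits(requests) {
		return false
	}
	p.reserved = resources.Merge(p.reserved, requests)
	return true
}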

@bwagner5 (Contributor, Author) replied:

Yeah, you're probably right that this can be reduced. Let me try to merge these. I think the dry-run would become isFull, and reserve would still exist but do less: it would just call isFull and then perform the reservation.

@bwagner5 (Contributor, Author) commented on Aug 31, 2021:

Actually, after digging into merging these two or having one use the other, I think it would make the interface of reserve significantly worse. The reserve function does something slightly different and returns a different result.

The reserve function takes a resource list and reserves it on the node if it doesn't overflow the node's resources. The return value signifies whether the reservation succeeded (did not overflow) or failed (overflowed).

The isFull function is only concerned with the node totals: whether they are completely full without overflowing, or whether they would overflow if the smallest pod remaining to be reserved in the list were added.

I think merging these functions makes the code less readable, adds complexity to reserve, and muddies the return value, since we'd need two bool returns: one indicating whether the reservation succeeded and one noting whether the node is full.

@@ -100,10 +101,29 @@ func (p *Packable) Pack(pods []*v1.Pod) *Result {
return result
}
result.unpacked = append(result.unpacked, pod)

if p.isFull(pods[len(pods)-1]) {
Review comment (Contributor):

Is it possible to move this check to the beginning of the loop? Unless I'm misunderstanding.

@bwagner5 (Contributor, Author) replied:

It can be moved to the top, but I think it makes more sense to keep it after the reserve block.

Here's how I reason about the func in English:

Try to reserve space for the pod; if that fails, check whether the node is full and potentially exit the loop. If the node isn't full (next block) and nothing has been packed (meaning the largest pod could not be packed), exit the loop because we need to find a bigger node. If we've made progress but the current node is just getting full, add the pod to the unpacked list and keep trying smaller pods. (A loose sketch of this flow follows.)
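
A loose sketch of that control flow (method names follow the snippets above; the packed/unpacked fields and exact ordering are assumptions, not the merged code):

for i, pod := range pods {
	if p.reservePod(pod) {
		result.packed = append(result.packed, pod) // assumed field name
		continue
	}
	// Reservation failed: if even the smallest remaining pod (the list is
	// sorted, so it is the last element) cannot fit, the node is full.
	if p.isFull(pods[len(pods)-1]) {
		result.unpacked = append(result.unpacked, pods[i:]...)
		break
	}
	// Nothing packed yet means the largest pod did not fit; a bigger node is needed.
	if len(result.packed) == 0 {
		break
	}
	// The node is just getting full; record this pod as unpacked and keep
	// trying smaller pods.
	result.unpacked = append(result.unpacked, pod)
}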

@JacobGabrielson (Contributor)

Nice!

ellistarn previously approved these changes on Sep 1, 2021

@ellistarn (Contributor)

Amazing!

@bwagner5 merged commit a5db3dc into aws:main on Sep 1, 2021
gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023